moral enhancement
Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
Galatolo, Alessio, Rappuoli, Luca Alberto, Winkle, Katie, Beloucif, Meriem
The recent rise in popularity of large language models (LLMs) has prompted considerable concerns about their moral capabilities. Although considerable effort has been dedicated to aligning LLMs with human moral values, existing benchmarks and evaluations remain largely superficial, typically measuring alignment based on final ethical verdicts rather than explicit moral reasoning. In response, this paper aims to advance the investigation of LLMs' moral capabilities by examining their capacity to function as Artificial Moral Assistants (AMAs), systems envisioned in the philosophical literature to support human moral deliberation. We assert that qualifying as an AMA requires more than what state-of-the-art alignment techniques aim to achieve: not only must AMAs be able to discern ethically problematic situations, they should also be able to actively reason about them, navigating between conflicting values outside of those embedded in the alignment phase. Building on existing philosophical literature, we begin by designing a new formal framework of the specific kind of behaviour an AMA should exhibit, individuating key qualities such as deductive and abductive moral reasoning. Drawing on this theoretical framework, we develop a benchmark to test these qualities and evaluate popular open LLMs against it. Our results reveal considerable variability across models and highlight persistent shortcomings, particularly regarding abductive moral reasoning. Our work connects theoretical philosophy with practical AI evaluation while also emphasising the need for dedicated strategies to explicitly enhance moral reasoning capabilities in LLMs.
Experts say using 'morality drugs' is a 'terrible idea'
Scientists have long dabbled with the idea that 'moral enhancement technologies,' such as drugs or even surgical techniques, could be used to improve human behaviour. But experts now warn that such methods are not only infeasible, but are generally a 'really bad idea.' While certain types of pharmaceutical or neurostimulation intervention may produce some positive effects, a new study has found that they are just 'blunt instruments' with inconsistent results – and historically, such practices have proven unwise. The researchers looked at the use of oxytocin (also known as the 'moral molecule') – a neuropeptide that plays a critical role in social cognition and bonding.